VAST 2012 Challenge
Mini-Challenge 1: Bank of Money Enterprise: Cyber Situation Awareness

 

 

Team Members:

Victor Yingjie Chen, Computer Graphics Technology, Purdue University, chen489@purdue.edu

Ahmad M Razip, Electrical and Computing Engineering, Purdue University, mohammea@purdue.edu

Sungahn Ko, Electrical and Computing Engineering, Purdue University, ko@purdue.edu

Cheryl Zhenyu Qian, Interaction Design, Purdue University, qianz@purdue.edu

David S. Ebert, Electrical and Computing Engineering, Purdue University, ebertd@purdue.edu

 

Student Team: No

 

Tool(s):

SemanticPrism was developed at Purdue University to visually analyze geospatial-temporal information at different levels of semantic zooming. It is a web application built with Adobe Flash, PHP, and MySQL, and it runs in most web browsers available today.

Before system development, we aggregated the datasets and considered many possible visualization methods. To speed up queries against the large raw dataset, we created several additional tables that index and aggregate the data. For example, the number of computers matching given criteria (e.g., policy, office, and time) is pre-aggregated. These data transformations significantly improved the performance of drawing curves, generating heatmaps, and placing computers on the map. The provided geo-temporal data has many dimensions: apart from geographic location and time, it also covers different kinds of activities, policies, and machine types. It is impossible to show all of this information with a single visualization technique. Also, for such a large and complex organization, the user should not only analyze the world as a whole, but also locate, narrow down, and investigate individual computers.
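The pre-aggregation step described above can be sketched as follows. This is a minimal illustration in Python, not the PHP and MySQL implementation used in SemanticPrism. The record keys (office, time, policy), the function name preaggregate, and the sample reports are assumptions made for this example.

```python
from collections import Counter

def preaggregate(records):
    """Count machine reports per (office, time slot, policy status).

    `records` is an iterable of dicts with the hypothetical keys
    'office', 'time', and 'policy' (one dict per machine report).
    """
    counts = Counter()
    for r in records:
        counts[(r["office"], r["time"], r["policy"])] += 1
    return counts

# Hypothetical sample reports, for illustration only.
reports = [
    {"office": "office-A", "time": "02-02 14:00", "policy": 5},
    {"office": "office-A", "time": "02-02 14:00", "policy": 5},
    {"office": "office-A", "time": "02-02 14:00", "policy": 4},
]
table = preaggregate(reports)
```

With such a table, a view that needs the number of machines for one office, time slot, and policy reads a single precomputed value instead of scanning the raw reports.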

To meet these requirements, we developed the VA system SemanticPrism with three main components: geo-temporal visualizations of computers and activities, time series curve graphs of policy violations and activity changes, and pixel visualizations of IP blocks on policies, activities, and numbers of connections. All three of these components are interlinked. Each of them also has two to four levels of semantic zooming.

Video:

vast12.wmv

 

 

Answers to Mini-Challenge 1 Questions:

 

MC 1.1  Create a visualization of the health and policy status of the entire Bank of Money enterprise as of 2 pm BMT (BankWorld Mean Time) on February 2. What areas of concern do you observe? 

 


Figure 1. Areas of concern by KDE heatmap based on Activities 2, 3, 4 and 5.

 

1) A KDE (Kernel Density Estimation) heatmap highlights areas of concern based on activities (Figure 1 (a) to (d)).
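For readers unfamiliar with KDE heatmaps, the idea can be sketched as below. This is a generic Gaussian kernel density estimate on a regular grid, not the SemanticPrism code. Weighting each office by a machine count, the grid size, the bandwidth, and the sample positions are all assumptions made for this example.

```python
import math

def kde_grid(points, width, height, bandwidth=1.0):
    """Gaussian kernel density estimate on a width x height grid.

    `points` is a list of (x, y, weight) tuples in grid coordinates.
    Returns a list of rows; grid[y][x] is the density at cell (x, y).
    """
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2)
    grid = [[0.0] * width for _ in range(height)]
    for gy in range(height):
        for gx in range(width):
            total = 0.0
            for x, y, w in points:
                d2 = (gx - x) ** 2 + (gy - y) ** 2
                total += w * norm * math.exp(-d2 / (2.0 * bandwidth ** 2))
            grid[gy][gx] = total
    return grid

# Hypothetical office positions, weighted by the number of machines
# reporting the selected activity.
offices = [(2, 2, 10), (7, 5, 3)]
heat = kde_grid(offices, width=10, height=8, bandwidth=1.5)
```

Cells near heavily weighted offices receive the highest density, so they would be drawn with the most intense color in the heatmap.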

 

 

Figure 2. Areas of concern based on Policy 4 & Policy 5.

 

 

2-1) Offices with machines violating a policy are represented on the map by blinking squares. A square grows larger as the number of violating machines increases. For example, the squares in Figure 2 (a) and (b) represent some of the offices, such as HQ DC2, whose machines reported Policy 4 and Policy 5 violations at the given time.
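One way to map the number of violating machines to a square size is sketched below. The source does not state the scaling formula that SemanticPrism uses, so the square-root mapping, the function name square_side, and all default values are assumptions made for this example.

```python
import math

def square_side(num_violating, min_side=4.0, max_side=40.0, max_count=1000):
    """Side length in pixels of an office square for a violation count.

    Square-root scaling keeps very large offices from dominating the map.
    All default values are hypothetical.
    """
    if num_violating <= 0:
        return 0.0  # no violating machines: no square is drawn
    frac = math.sqrt(min(num_violating, max_count) / max_count)
    return min_side + (max_side - min_side) * frac
```

Counts above max_count are clamped, so a single very large office cannot produce a square that covers its neighbors.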

 

2-2) The statusBar view in Figure 2 (c) shows the status changes of machines in data center 2. In the statusBar view, the policy status of each machine is represented by a hue of red. The darkest red means the machine is violating Policy 5 (virus infection), as indicated by the arrows in Figure 2 (d). Machine 172.2.194.20 was the first computer to violate Policy 5, at 12:45 pm.

 

Figure 3. Other areas of concern.

 

3-1) The policy statuses reported by all machines are shown in Figure 3 (a) in different colors (e.g., green for Policy 3, red for Policy 5). At 2 pm, marked by "Time line", many machines were reporting Policy 3, 4, and 5. These regions need to be monitored.

 

3-2) Figure 3 (b) shows that offices in the lower part of region 25 are offline (possibly due to a power outage or a network problem).

 

3-3) The pixel visualization in Figure 3 (c) represents the status of all policies and activities by IP block. Regions with dark blue or dark red pixels are regions of concern. Linking pixels with regions on the map can be considered a transformation from IP space to geographic space.
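The link from IP space to geographic space can be sketched as a lookup from IP block to office. The report states (under Anomaly 5) that machines in one C level block belong to the same office. The helper names, the lookup table, and the office name "office-A" below are hypothetical.

```python
def c_block(ip):
    """Return the C level block of an IPv4 address, e.g. '172.2.194.x'."""
    a, b, c, _ = ip.split(".")
    return f"{a}.{b}.{c}.x"

def b_block(ip):
    """Return the B level block of an IPv4 address, e.g. '172.2.x.x'."""
    a, b, _, _ = ip.split(".")
    return f"{a}.{b}.x.x"

# Hypothetical lookup table from C level block to office.
OFFICE_OF_BLOCK = {"172.2.194.x": "office-A"}

def office_of(ip, table=OFFICE_OF_BLOCK):
    """Return the office that owns the IP's C block, or None if unknown."""
    return table.get(c_block(ip))
```

Given an office's map position, a selected pixel (one IP block) can then be highlighted at the corresponding location on the map.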

 

 

 

 

MC 1.2  Use your visualization tools to look at how the network’s status changes over time. Highlight up to five potential anomalies in the network and provide a visualization of each. When did each anomaly begin and end? What might be an explanation of each anomaly?

 

Figure 4: Pattern of Virus spreading.

 

Anomaly 1: Figure 4 shows the pattern of the virus spreading. In (a), the spread started from one machine at 02-02, 12:45 BMT ("#M" denotes the number of infected machines). Over time, the virus spread to other machines, as shown in (b) and (c). As the number of infected machines increased, we used the KDE heatmap and the office scaling function to visualize the spread. The scaling function resizes each office's representation (a square) according to the number of machines reporting the policy violation selected by the user. Figure 4 (e) shows the last time slot in the data, where most offices are infected. Headquarters and data centers appear to have many infected computers, as shown in Figure 4 (f), where the darkest red in the status bar marks the time during which a machine is infected.

 

 

Figure 5: Unexpected offline statuses of machines.

 

Anomaly 2: In region 5, some offices are offline, shown in Figure 5 as black square dots. This phenomenon started at 02-02-2012 12:15 BMT. Until 23:00 BMT, the number of offline offices increased, with machines going offline progressively from the south of the region to the northeast. After 23:00 BMT, the number of offices with offline machines started to decrease. This might be explained by an event not recorded in the data, such as a sudden power outage in the region.

 

 

 

Figure 6: Adding servers suddenly

 

 

Figure 7: Added servers in HQ-DC5

 

Anomaly 3: Machines were added at 02/02 18:00 BMT, and this happened only once. By examining offices in different regions, we found that Data center 5 added many servers during that time period, as shown in Figure 6 and Figure 7, where red arrows point to the addition. In these figures, the y-axis is the number of machines and the x-axis is time.
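In the report, this anomaly was found by visually inspecting the curves. A comparable automated check could look like the sketch below, which flags time slots where the machine count rises sharply. The function name find_jumps, the threshold, and the sample series are hypothetical and are not taken from the challenge data.

```python
def find_jumps(counts, min_increase):
    """Return indices where the count rises by at least `min_increase`
    compared with the previous time slot."""
    return [
        i
        for i in range(1, len(counts))
        if counts[i] - counts[i - 1] >= min_increase
    ]

# Hypothetical machine counts per time slot for one office.
series = [500, 500, 501, 650, 650, 650]
jumps = find_jumps(series, min_increase=100)
```

Here the only flagged index is the slot where the count steps from 501 to 650, which corresponds to the kind of one-time addition marked by the red arrows.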

 

 

 

Figure 8: Trend of increasing policy (top) while activities look normal (bottom)

 

 

Figure 9: Trend of the increasing number of infected machines

 

Anomaly 4: The number of machines violating policies is increasing, as shown in Figure 8 (top). Because a higher policy number indicates a more severe problem, this rise in the machines' policy status implies a critical situation overall. Figure 9 shows the increasing number of infected machines, broken down by machine type. In Figures 8 and 9, the y-axis is the number of machines and the x-axis is time.

 

 

Figure 10: The NoC (Number of Connections) grid view shows all C level IP blocks for the selected B level block. We consider a sudden jump in the number of connections to be an anomaly. Blue arrows in the first column mark examples of such anomalies.

 

Figure 11: Anomalies regarding numbers of connections

 

Anomaly 5: The NoC (Number of Connections) grid view enables the user to examine all C level IP blocks (e.g., 172.32.56.x) in the selected B level IP block (e.g., 172.32.x.x), as shown in Figure 10. Machines in one C block belong to the same office, and one office may have one or more C level blocks. For each C level block, the maximum number of connections (orange) and the average number of connections (beige) are shown, as in Figure 11 (b). In Figure 10, many C level blocks in the B level block (172.32.x.x, green arrow) have unexpected numbers of connections at night. Some example blocks are indicated by blue arrows in the first column. Figure 11 (b) shows one example C level block (172.8.117.x) with a sudden peak in the number of connections at night.
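The per-block statistics behind the NoC grid view (maximum and average number of connections per C level block) can be sketched as follows. The tuple layout (ip, time slot, connections), the function name noc_summary, and the sample values are assumptions made for this example and are not taken from the challenge data.

```python
from collections import defaultdict

def noc_summary(samples):
    """Max and average number of connections per (C block, time slot).

    `samples` is an iterable of (ip, time_slot, connections) tuples.
    Returns a dict mapping (block, time_slot) to (maximum, average).
    """
    groups = defaultdict(list)
    for ip, slot, noc in samples:
        block = ip.rsplit(".", 1)[0] + ".x"  # e.g. '172.8.117.x'
        groups[(block, slot)].append(noc)
    return {key: (max(v), sum(v) / len(v)) for key, v in groups.items()}

# Hypothetical connection counts for two machines in one C block.
samples = [
    ("172.8.117.10", "19:45", 4),
    ("172.8.117.11", "19:45", 6),
    ("172.8.117.10", "20:00", 5),
    ("172.8.117.11", "20:00", 95),
]
summary = noc_summary(samples)
```

A block whose maximum rises far above its average between two adjacent time slots, as in the second slot of this sample, is the kind of sudden peak that the grid view makes visible.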

Investigating further, we can see the computers of the selected C block in the status view, as shown in Figure 11 (c). The status view visualizes the history of all machines in the IP block, where we can find detailed information on the suspected machines. For example, the three machines in the red rectangle have an unexpectedly high number of connections. A black rectangle in the view means the computer was offline for that period.

 

After the investigation, we found four B level blocks that show this kind of anomaly. In the 172.8 B level block, only two C level blocks are affected: 172.8.117.x and 172.8.119.x. In contrast, many C blocks show this anomaly in the 172.30, 172.31, and 172.32 B level blocks. What they have in common is that all of these blocks belong to Region 10 and that the anomaly took place from 8:00 pm to 11:59 pm on Feb. 3rd. The affected computers are workstations used by tellers.